Authorship Verification, combining Linguistic Features and Different Similarity Functions

نویسندگان

  • Daniel Castro-Castro
  • Yaritza Adame Arcia
  • María Peláez Brioso
  • Rafael Muñoz
چکیده

Authorship analysis is an important task for different text applications, for example in the field of digital forensic text analysis. Hence, we propose an authorship analysis method that compares the average similarity of a text of unknown authorship with all the texts of an author. Using this idea, a text that was not written by an author, would not exceed the average of similarity with known texts and a text of unknown authorship would be considered as written by the author, only if it exceeds the average of similarity obtained between texts written by him and if it got the major value comparing the average similarity with the rest of the authors. For each linguistic feature we obtain a vote by majority using different functions and for the final decision we divide the number of votes for each feature that consider as written by the author the unknown text by the total of features analyzed. The results obtained for each language in the PAN 2015 authorship verification competition are exposed in the overview of the task.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Authorship Identification in Large Email Collections: Experiments Using Features that Belong to Different Linguistic Levels - Notebook for PAN at CLEF 2011

The aim of this paper is to explore the usefulness of using features from different linguistic levels to email authorship identification. Using various email datasets provided by PAN’11 lab we tested several feature groups in both authorship attribution and authorship verification subtasks. The selected feature groups combined with Regularized Logistic Regression and One-Class SVMmachine learni...

متن کامل

Authorship Verification, Average Similarity Analysis

Authorship analysis is an important task for different text applications, for example in the field of digital forensic text analysis. Hence, we propose an authorship analysis method that compares the average similarity of a text of unknown authorship with all the text of an author. Using this idea, a text that was not written by an author, would not exceed the average of similarity with known t...

متن کامل

Linguistic Profiling for Authorship Recognition and Verification

A new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. The technique proves to be quite effective for authorship verification and recognition. The best parameter settings yield a False Accept Rate of 8.1% at a False Reject Rate equal to zero for t...

متن کامل

A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF)

This paper presents the performance evaluation of an authorship verification technique that is based on a modified version of General Impostors (GI) [2]. The novelties of this implementation are: 1. a modified way of combining the min-max similarity measure and, 2. a relatively large set of diverse features that spans letter-level, word-level, function word-level, word shape-level, and word tag...

متن کامل

Sub-Profiling by Linguistic Dimensions to Solve the Authorship Attribution Task

In this paper, we describe a modified version of the profile-based approach for the Authorship Attribution (AA) task of the PAN 2012 challenge. Our PAN system for AA utilizes the concept of linguistic modalities on profile-based (PB) approaches. We concatenate all the training documents from the same author and build author-specific sub-profiles, one per linguistic modality. Then instead of usi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015